Distance-Based Kernels for Real-Valued Data
نویسندگان
چکیده
We consider distance-based similarity measures for real-valued vectors of interest in kernel-based machine learning algorithms. In particular, a truncated Euclidean similarity measure and a self-normalized similarity measure related to the Canberra distance. It is proved that they are positive semi-definite (p.s.d.), thus facilitating their use in kernel-based methods, like the Support Vector Machine, a very popular machine learning tool. These kernels may be better suited than standard kernels (like the RBF) in certain situations, that are described in the paper. Some rather general results concerning positivity properties are presented in detail as well as some interesting ways of proving the p.s.d. property.
منابع مشابه
یادگیری نیمه نظارتی کرنل مرکب با استفاده از تکنیکهای یادگیری معیار فاصله
Distance metric has a key role in many machine learning and computer vision algorithms so that choosing an appropriate distance metric has a direct effect on the performance of such algorithms. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...
متن کاملA Unifying View of Explicit and Implicit Feature Maps for Structured Data: Systematic Studies of Graph Kernels
Non-linear kernel methods can be approximated by fast linear ones using suitable explicit feature maps allowing their application to large scale problems. To this end, explicit feature maps of kernels for vectorial data have been extensively studied. As many real-world data is structured, various kernels for complex data like graphs have been proposed. Indeed, many of them directly compute feat...
متن کاملPattern recognition algorithms for symbol strings
Traditionally, pattern recognition has been concerned mostly with numerical data, i.e. vectors of real-valued features. Less often, symbolic representations of data have been used. A special category of data, symbol strings, have been neglected for a long time, partially because of a perceived lack of urgency and partially because of the high computational costs involved. Only recently, motivat...
متن کاملComposite Kernel Optimization in Semi-Supervised Metric
Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...
متن کاملMultiple Fuzzy Regression Model for Fuzzy Input-Output Data
A novel approach to the problem of regression modeling for fuzzy input-output data is introduced.In order to estimate the parameters of the model, a distance on the space of interval-valued quantities is employed.By minimizing the sum of squared errors, a class of regression models is derived based on the interval-valued data obtained from the $alpha$-level sets of fuzzy input-output data.Then,...
متن کامل